1.  Introduction


The  techniques  of  Gaussian  and  Gauss-Jordan  elimination,  originally 
devised  to  solve  systems  of  equations  over  the  real  numbers,  have  been 
repeatedly  rediscovered  and  applied  to  other  problems.  These  include  shortest 
path  problems  [6,10,16],  path-finding  problems  [4],  global  flow  analysis 
[2,12,15,23], and conversion of finite automata to regular expressions [18].
The  most  fundamental  of  these  problems  is  the  (single  source)  path 
expression  problem:  Given  a graph  G = (V, E)  and  a distinguished 
source  vertex  s , find  a regular  expression  P(s, v)  for  each  vertex  v 
which  represents  all  paths  from  s to  v in  G . By  reinterpreting 
the ∪ , · , and * operations used to construct regular expressions,
we  can  use  a solution  to  the  single-source  path  expression  problem  to 
solve  other  kinds  of  path  problems,  including  those  mentioned  above  [30]. 

We  thus  obtain  a general-purpose  algorithm  for  solving  any  path  problem 
on  a given  graph. 

This  paper  describes  a decomposition  method  for  computing  path 
expressions.  The  method  divides  the  graph  G into  components  based 
upon  the  dominator  tree  of  G , computes  a path  expression  for  each 
component  by  Gaussian  elimination,  and  combines  the  solutions  using 
an algorithm for evaluating functions defined on trees [9,29]. The
algorithm requires O(m α(m,n)) time plus time to compute path expressions
within the components, where n is the number of vertices in G,
m is the number of edges in G, and α is a functional inverse of
Ackermann's function. If G is a reducible flow graph, each component
of G is a single vertex, and the method requires O(m α(m,n)) time
total. Although the method is rather complicated, a simplified version,
which runs in O(m log n) time, is quite easy to program and efficient
in practice.

The  paper  contains  seven  sections.  Section  2 reviews  the  properties 
of  regular  expressions  used  in  the  following  sections.  Section  3 
reviews  standard  methods  of  numerical  linear  algebra  and  describes 
their  application  to  the  path  expression  problem.  This  section  introduces 
the  notion  of  a path  sequence  for  a graph  G and  shows  how,  given  a 
path  sequence,  one  can  solve  the  single-source  path  expression  problem 
for  any  source  in  time  proportional  to  the  length  of  the  path  sequence. 
Section 4 presents an O(m α(m,n))-time algorithm for solving a
single-source path problem on a reducible flow graph if the source is the start
vertex  of  the  graph.  Section  5 extends  the  algorithm  so  that  it 
computes  path  sequences  for  reducible  flow  graphs.  Section  6 generalizes 
the  method  to  non-reducible  graphs.  Section  7 discusses  applications 
and  suggests  further  research  topics.  The  appendix  contains  the  basic 
graph-theoretic  terminology  used  in  the  paper.  An  earlier  and  much 
different  version  of  this  paper  appeared  as  a Stanford  technical  report  [27]. 


2.  Regular  Expressions  and  Path  Expressions. 

Let Σ be a finite alphabet containing neither "Λ" nor "∅".
A regular expression over Σ is any expression built by applying the
following rules.

(1a) "Λ" and "∅" are atomic regular expressions; for any a ∈ Σ,
     "a" is an atomic regular expression.

(1b) If R_1 and R_2 are regular expressions, then (R_1 ∪ R_2),
     (R_1 · R_2), and (R_1)* are compound regular expressions.


In a regular expression, Λ denotes the empty string, ∅ denotes
the empty set, ∪ denotes set union, · denotes concatenation, and
* denotes reflexive, transitive closure under concatenation. Thus
each regular expression R over Σ represents a set σ(R) of strings
over Σ defined as follows:

(2a) σ(Λ) = {Λ} ; σ(∅) = ∅ ; σ(a) = {a} for a ∈ Σ .

(2b) σ(R_1 ∪ R_2) = σ(R_1) ∪ σ(R_2) = {w | w ∈ σ(R_1) or w ∈ σ(R_2)} ;

     σ(R_1 · R_2) = σ(R_1) · σ(R_2) = {w_1 w_2 | w_1 ∈ σ(R_1) and w_2 ∈ σ(R_2)} ;

     σ(R*) = ∪_{k=0}^{∞} σ(R)^k , where σ(R)^0 = {Λ} and
     σ(R)^i = σ(R)^{i-1} · σ(R) .
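
The semantics (2a)-(2b) can be made concrete with a short sketch. The
Python fragment below is illustrative only and is not part of the original
development. It assumes a hypothetical encoding of regular expressions as
nested tuples (LAMBDA, EMPTY, plain strings for alphabet symbols, and
("union", ...), ("cat", ...), ("star", ...) nodes), and it enumerates only
the strings of σ(R) up to a chosen length bound, since σ(R) may be infinite.

```python
# Hypothetical tuple encoding of regular expressions (an assumption of this sketch).
LAMBDA = ("lam",)     # the empty string
EMPTY = ("empty",)    # the empty set

def sigma(R, max_len):
    """Return the strings of sigma(R) of length <= max_len, each as a tuple of symbols."""
    if R == LAMBDA:
        return {()}
    if R == EMPTY:
        return set()
    if isinstance(R, str):                      # an alphabet symbol
        return {(R,)} if max_len >= 1 else set()
    op = R[0]
    if op == "union":
        return sigma(R[1], max_len) | sigma(R[2], max_len)
    if op == "cat":
        left, right = sigma(R[1], max_len), sigma(R[2], max_len)
        return {x + y for x in left for y in right if len(x) + len(y) <= max_len}
    if op == "star":
        base = sigma(R[1], max_len) - {()}      # empty pieces add nothing new
        result, frontier = {()}, {()}
        while frontier:                         # grow by one more factor each round
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError("unknown regular expression node")
```

For instance, sigma(("star", ("cat", "b", "c")), 4) yields the empty string,
bc, and bcbc.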


Note that each of the symbols Λ, ∅, ∪, ·, * stands in the text both
for the symbol itself and for a string, set, or operation. We shall
allow the context to resolve this ambiguity. Also, we shall freely
omit parentheses from regular expressions when the meaning is clear;
we assume the standard operator precedence: * over · over ∪ .

The reverse R^r of a regular expression R is defined by

(3a) Λ^r = Λ ; ∅^r = ∅ ; a^r = a for a ∈ Σ .

(3b) (R_1 ∪ R_2)^r = R_1^r ∪ R_2^r ;

     (R_1 · R_2)^r = R_2^r · R_1^r ;

     (R_1*)^r = (R_1^r)* .

Two regular expressions R_1 and R_2 are equivalent if σ(R_1) = σ(R_2).
A regular expression R is simple if R = ∅ or R does not contain ∅
as a subexpression. We can transform any regular expression R into an
equivalent simple regular expression by repeating the following
transformations until none is applicable: (i) replace any subexpression
of the form ∅·R_1 or R_1·∅ by ∅ ; (ii) replace any subexpression of
the form ∅ ∪ R_1 or R_1 ∪ ∅ by R_1 ; (iii) replace any subexpression
of the form ∅* by Λ .

A regular expression R is non-redundant if R represents every
string in σ(R) uniquely. We can make this definition precise as
follows:

(4a) Λ , ∅ , and a for each a ∈ Σ are non-redundant.

(4b) Let R_1 and R_2 be non-redundant.

     R_1 ∪ R_2 is non-redundant if σ(R_1) ∩ σ(R_2) = ∅ .

     R_1 · R_2 is non-redundant if each w ∈ σ(R_1 · R_2) is uniquely
     decomposable into w = w_1 w_2 with w_1 ∈ σ(R_1) and
     w_2 ∈ σ(R_2) .

     R_1* is non-redundant if each w ∈ σ(R_1*) is uniquely decomposable
     into w = w_1 w_2 ... w_k with w_i ∈ σ(R_1) for 1 ≤ i ≤ k .

Note that if R* is non-redundant, Λ ∉ σ(R) .

Let  G = (V, E)  be  a directed  graph.  Any  path  in  G is  a sequence 
of edges, which we can regard as a string over E. A path expression P
of type (v,w) is a simple regular expression over E such that every
string in σ(P) is a path from v to w. Every subexpression of a
path expression is a path expression, whose type can be determined as
follows.

(5)  Let P be a path expression of type (v,w).

     If P = P_1 ∪ P_2 , then P_1 and P_2 are path expressions of
     type (v,w).

     If P = P_1 · P_2 , then there must be a unique vertex u such
     that P_1 is a path expression of type (v,u) and P_2
     is a path expression of type (u,w).

     If P = P_1* , then v = w and P_1 is a path expression of
     type (v,w) = (v,v).

It is easy to verify (5) using the fact that P is simple. Note that
A is  a path  expression  of  type  (v,v)  for  any  v . 

In  describing  algorithms  to  compute  path  expressions  we  shall  assume 
that each ∪ , · , and * operation requires constant time. If we

represent  the  computed  path  expressions  by  a directed  acyclic  graph  as 
described  by  Aho  and  Ullman  [2,  pp.  418-426],  this  is  a reasonable 
assumption. 

3.  Path Expression Problems and Path Sequences.

Let G = (V,E) be a directed graph. The single-source path
expression problem for source vertex s is the problem of computing,
for each vertex v ∈ V, a non-redundant path expression P(s,v) such
that σ(P(s,v)) contains all paths from s to v. The single-sink
path expression problem for sink vertex t is the problem of computing,
for each vertex v ∈ V, a non-redundant path expression P(v,t) such
that σ(P(v,t)) contains all paths from v to t. The all-pairs
path expression problem is the problem of computing, for all pairs v,w ∈ V,
a non-redundant path expression P(v,w) such that σ(P(v,w)) contains
all paths from v to w.

In  this  paper  we  develop  a way  to  solve  path  expression  problems  by 
using  Gaussian  elimination  in  combination  with  methods  for  decomposing 
G into  components.  In  this  section  we  describe  how  Gaussian  elimination 
applies  to  such  problems.  We  also  describe  a well-known  decomposition 
method  which  uses  the  strong  components  of  G . In  subsequent  sections 
we  present  a more  powerful  decomposition  method  based  upon  the 
dominator  tree  of  G . 

Gaussian  elimination  was  originally  developed  to  solve  a system  of 
linear equations Ax = b , where A is an n x n matrix of real-valued
coefficients, x is an n x 1 vector of variables, and b is an n x 1
vector of real-valued constants [11]. The method consists of two steps.

Step  1 (LU  decomposition).  Decompose  A into  A = LU  , where  L is 
unit  lower  triangular  and  U is  upper  triangular. 

Step 2 (Frontsolving and backsolving). Solve the triangular systems
Ly  = b (frontsolving)  and  Ux  = y (backsolving). 


The  resource  requirements  of  Step  1 dominate  those  of  Step  2 and 
thus  determine  the  overall  requirements  of  the  algorithm  [5,28].  The 
method  has  several  pleasant  features,  including  its  amenability  to  an 
implementation  that  takes  advantage  of  the  sparsity  of  A , avoiding 
arithmetic  on  numbers  known  to  be  zero  [8,22].  It  is  also  possible 
to  solve  Ax  = b for  multiple  right-hand  sides  by  carrying  out  Step  1 
once  and  repeating  Step  2 for  each  value  of  b . 
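
The two steps can be sketched as follows. This Python fragment is an
illustration only, not part of the original presentation: the function
names are hypothetical, exact rational arithmetic is used to avoid rounding
questions, and it assumes every pivot is nonzero (no pivoting is attempted).

```python
from fractions import Fraction

def lu_decompose(A):
    """Step 1: return (L, U) with A = LU, L unit lower triangular, U upper triangular.
    Assumes all pivots U[v][v] are nonzero (no pivoting)."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(x) for x in row] for row in A]
    for v in range(n):
        for u in range(v + 1, n):
            L[u][v] = U[u][v] / U[v][v]          # multiplier for row u
            for w in range(v, n):
                U[u][w] -= L[u][v] * U[v][w]     # eliminate column v from row u
    return L, U

def solve_lu(L, U, b):
    """Step 2: solve Ly = b (frontsolving), then Ux = y (backsolving)."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        y[i] = Fraction(b[i]) - sum(L[i][j] * y[j] for j in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Step 1 is carried out once; Step 2 is repeated for each right-hand side.
L, U = lu_decompose([[2, 1], [4, 5]])
x1 = solve_lu(L, U, [3, 9])
x2 = solve_lu(L, U, [1, 2])
```

Reusing L and U for the second right-hand side mirrors the remark above
about multiple right-hand sides.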

We apply this method to path expression problems by introducing the
notion of a path sequence, which generalizes Kennedy's node listing
concept [17]. A path sequence for a directed graph G is a sequence
(P_1,v_1,w_1), (P_2,v_2,w_2), ..., (P_l,v_l,w_l) such that

(6a) For 1 ≤ i ≤ l , P_i is a non-redundant path expression of
     type (v_i,w_i) .

(6b) For 1 ≤ i ≤ l , if v_i = w_i then Λ ∈ σ(P_i) .

(6c) For any non-empty path p in G , there is a unique sequence
     of indices 1 ≤ i_1 < i_2 < ... < i_k ≤ l and a unique partition
     of p into non-empty paths p = p_1, p_2, ..., p_k such that
     p_j ∈ σ(P_{i_j}) for 1 ≤ j ≤ k .

J J 

Given a path sequence, we can solve the single-source path expression
problem for any source s by using the following propagation algorithm.*

* We shall use a syntax resembling Dijkstra's [7] for expressing
algorithms.

procedure SOLVE;
begin
initialize: P(s,s) := Λ; for each v ∈ V-{s} do P(s,v) := ∅ od;
loop:       for i := 1 until l do
                if v_i = w_i → P(s,v_i) := [P(s,v_i)·P_i]
                □  v_i ≠ w_i → P(s,w_i) := [P(s,w_i) ∪ [P(s,v_i)·P_i]] fi od
end SOLVE;


In this and subsequent algorithms, the square brackets denote the
following simplification procedure. This procedure, when applied
recursively, produces regular expressions that are not only simple but also
contain no subexpressions of the form Λ·R_1 , R_1·Λ , or Λ* .

regular expression procedure [R];
    if R = R_1 ∪ R_2 → if R_1 = ∅ → R_2 □ R_2 = ∅ → R_1 fi
    □  R = R_1 · R_2 → if (R_1 = ∅) or (R_2 = ∅) → ∅ □ R_1 = Λ → R_2 □ R_2 = Λ → R_1 fi
    □  R = R_1*      → if (R_1 = ∅) or (R_1 = Λ) → Λ fi fi;
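
The propagation algorithm and the bracket simplification can be sketched
together in a few lines. The Python fragment below is an illustration under
stated assumptions, not the authors' implementation: regular expressions use
a hypothetical nested-tuple encoding with strings as edge names, the
constructors union, cat, and star play the role of the square brackets (and,
as an assumption of this sketch, return the unsimplified expression when no
simplification applies), and no attempt is made to share subexpressions in
a directed acyclic graph.

```python
LAMBDA = ("lam",)     # empty string (hypothetical encoding)
EMPTY = ("empty",)    # empty set

def union(a, b):
    """[a U b]: drop an empty-set operand."""
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return ("union", a, b)

def cat(a, b):
    """[a . b]: empty set annihilates, empty string is the identity."""
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == LAMBDA:
        return b
    if b == LAMBDA:
        return a
    return ("cat", a, b)

def star(a):
    """[a*]: the closure of the empty set or the empty string is the empty string."""
    if a in (EMPTY, LAMBDA):
        return LAMBDA
    return ("star", a)

def solve(vertices, path_sequence, s):
    """Propagate along a path sequence [(P_i, v_i, w_i), ...] from source s."""
    P = {v: EMPTY for v in vertices}
    P[s] = LAMBDA
    for Pi, vi, wi in path_sequence:
        if vi == wi:
            P[vi] = cat(P[vi], Pi)
        else:
            P[wi] = union(P[wi], cat(P[vi], Pi))
    return P

# Acyclic example: edges a: 1->2, c: 1->3, b: 2->3, listed by source vertex.
result = solve([1, 2, 3], [("a", 1, 2), ("c", 1, 3), ("b", 2, 3)], 1)
```

In the example, the expression computed for vertex 3 is the union of c and
the concatenation a·b, as expected for the two paths from 1 to 3.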

Lemma 1. Let (P_1,v_1,w_1), (P_2,v_2,w_2), ..., (P_l,v_l,w_l) be a path sequence
for G and let v be any vertex. After i iterations of the loop in
SOLVE, P(s,v) is a non-redundant path expression representing exactly Λ
(if s = v ) and all non-empty paths p from s to v for which there
is a sequence of indices 1 ≤ i_1 < i_2 < ... < i_k ≤ i and a partition of
p into p = p_1, p_2, ..., p_k such that p_j ∈ σ(P_{i_j}) for 1 ≤ j ≤ k .

Proof. Straightforward by induction on i . □

Theorem 1. Let (P_1,v_1,w_1), (P_2,v_2,w_2), ..., (P_l,v_l,w_l) be a path sequence
for G and let v be any vertex. After execution of SOLVE, P(s,v) is
a non-redundant path expression representing all paths from s to v .

SOLVE is a generalization of the frontsolving-backsolving step in
Gaussian elimination; its running time is O(n+l) . To solve a
single-source path expression problem on a graph G , we construct a path
sequence and apply SOLVE once. To solve an all-pairs path expression
problem, we construct a path sequence and apply SOLVE n times, once
for each possible source. To solve a single-sink path expression problem,
we employ the following theorem to construct a path sequence for G^r ,
and then we solve the corresponding single-source problem on G^r .

Theorem 2. Let (P_1,v_1,w_1), (P_2,v_2,w_2), ..., (P_l,v_l,w_l) be a path sequence
for a graph G. Then (P_l^r,w_l,v_l), ..., (P_2^r,w_2,v_2), (P_1^r,w_1,v_1) is a path
sequence for G^r .

Proof. Immediate. □

By Theorem 2 it is no harder to compute a path sequence for G^r than
to compute a path sequence for G .
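
Theorem 2 translates directly into code. The sketch below is illustrative
only; it assumes the same hypothetical nested-tuple encoding of regular
expressions used for exposition (strings as edge names, LAMBDA and EMPTY as
the atomic constants) and applies the reversal rules (3a)-(3b) to every
element of a path sequence while reversing the order of the sequence.

```python
LAMBDA = ("lam",)     # empty string (hypothetical encoding)
EMPTY = ("empty",)    # empty set

def reverse(R):
    """Reverse a regular expression following rules (3a) and (3b)."""
    if R in (LAMBDA, EMPTY) or isinstance(R, str):
        return R
    op = R[0]
    if op == "union":
        return ("union", reverse(R[1]), reverse(R[2]))
    if op == "cat":                              # operands swap under reversal
        return ("cat", reverse(R[2]), reverse(R[1]))
    if op == "star":
        return ("star", reverse(R[1]))
    raise ValueError("unknown regular expression node")

def reverse_path_sequence(seq):
    """Given a path sequence for G, return the sequence of Theorem 2 for the reverse graph."""
    return [(reverse(P), w, v) for (P, v, w) in reversed(seq)]

seq = [("a", 1, 2), (("star", ("cat", "b", "a")), 2, 2), ("b", 2, 1)]
rev = reverse_path_sequence(seq)
```

Running a single-source propagation on the reversed sequence from a vertex t
then gives, after reversing each result, the single-sink expressions for t.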

We  can  construct  a path  sequence  for  an  arbitrary  graph  by  using  a 
method  analogous  to  Step  1 of  Gaussian  elimination.  The  method  is  similar 
to Kleene's algorithm for converting a finite automaton into a regular
expression [18], except that Kleene uses Gauss-Jordan elimination. Let
G = (V,E) be a directed graph whose vertices are numbered from 1 to n
and  identified  by  number.  The  following  procedure  computes  a set  of  path 
expressions  which  when  properly  ordered  gives  a path  sequence. 

procedure ELIMINATE;
begin
initialize: for v := 1 until n do for w := 1 until n do P(v,w) := ∅ od od;
            for each e ∈ E do P(h(e),t(e)) := [P(h(e),t(e)) ∪ e] od;
loop:       for v := 1 until n do
                P(v,v) := [P(v,v)*];
                for each u > v such that P(u,v) ≠ ∅ do
                    P(u,v) := [P(u,v)·P(v,v)];
                    for each w > v such that P(v,w) ≠ ∅ do
                        P(u,w) := [P(u,w) ∪ [P(u,v)·P(v,w)]] od od od
end ELIMINATE;
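
A direct transcription of ELIMINATE may help in following the lemma below.
The Python sketch is illustrative and makes several assumptions that are not
in the original: expressions use a hypothetical nested-tuple encoding, edges
are given as (name, h, t) triples with vertices numbered 1 to n, and P is
stored sparsely in a dictionary, with missing entries standing for the empty
set.

```python
LAMBDA = ("lam",)     # empty string (hypothetical encoding)
EMPTY = ("empty",)    # empty set

def union(a, b):
    return b if a == EMPTY else a if b == EMPTY else ("union", a, b)

def cat(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    return b if a == LAMBDA else a if b == LAMBDA else ("cat", a, b)

def star(a):
    return LAMBDA if a in (EMPTY, LAMBDA) else ("star", a)

def eliminate(n, edges):
    """Return a dict mapping (u, w) to P(u, w); absent pairs mean the empty set."""
    P = {}

    def get(u, w):
        return P.get((u, w), EMPTY)

    for name, h, t in edges:                     # initialize from the edges
        P[(h, t)] = union(get(h, t), name)
    for v in range(1, n + 1):                    # eliminate vertex v
        P[(v, v)] = star(get(v, v))
        for u in range(v + 1, n + 1):
            if get(u, v) == EMPTY:
                continue
            P[(u, v)] = cat(P[(u, v)], P[(v, v)])
            for w in range(v + 1, n + 1):
                if get(v, w) == EMPTY:
                    continue
                P[(u, w)] = union(get(u, w), cat(P[(u, v)], P[(v, w)]))
    return P

# Two-vertex cycle: a: 1->2, b: 2->1.
P = eliminate(2, [("a", 1, 2), ("b", 2, 1)])
```

On the two-vertex cycle the only compound result is P(2,2) = (b·a)*, the
paths from 2 back to 2 through vertex 1.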


Lemma 2. After the v-th iteration of the loop in ELIMINATE, the following
statements are true.

(i)  P(u,w) for u ≥ w and w ≤ v is a non-redundant path expression
     representing exactly the paths from u to w which contain no
     intermediate vertex larger than w .

(ii) P(u,w) for u < w or w > v is a non-redundant path expression
     representing exactly the non-empty paths from u to w all of
     whose intermediate vertices are smaller than min{u,v+1} .


Proof.  Straightforward  by  induction  on  v . □ 




Theorem 3. After execution of ELIMINATE the following statements are
true.

(i)  P(u,w) for u ≥ w is a non-redundant path expression representing
     exactly the paths from u to w which contain no intermediate
     vertex larger than w .

(ii) P(u,w) for u < w is a non-redundant path expression representing
     exactly the paths from u to w all of whose intermediate vertices
     are smaller than u .

Theorem 4. Let P(u,w) for u,w ∈ V be the path expressions computed
by ELIMINATE. Then the following sequence is a path sequence: the
elements of {(P(u,w),u,w) | P(u,w) ∉ {∅,Λ} and u ≤ w} in increasing order
on u , followed by the elements of {(P(u,w),u,w) | P(u,w) ≠ ∅ and u > w}
in decreasing order on u .

Proof. The sequence specified in the theorem certainly satisfies (6a)
and (6b). To prove (6c), let p be any non-empty path in G . Let v_0
be the maximum vertex on p . Let p_0 be the part of p from the first
occurrence of v_0 to the last occurrence of v_0 (if v_0 only occurs once,
p_0 = Λ ). For i ≥ 1 , let v_i be the largest vertex occurring on p
after the last occurrence of v_{i-1} , and let p_i be the part of p
from the last occurrence of v_{i-1} to the last occurrence of v_i .
Let v_j be the last such v_i defined ( v_j = t(p) ). For i ≥ 1 ,
let v_{-i} be the largest vertex occurring on p before the first
occurrence of v_{-i+1} . Let p_{-2i+1} be the part of p from the last
occurrence of v_{-i} before p_{-2i+2} to the beginning of p_{-2i+2} ,
and let p_{-2i} be the part of p from the first occurrence of v_{-i}
to the beginning of p_{-2i+1} . Let v_{-k} be the last
such v_{-i} defined ( v_{-k} = h(p) ). Then
p = p_{-2k}, p_{-2k+1}, ..., p_{-1}, p_0, p_1, ..., p_j with
p_{-2i} ∈ σ(P(v_{-i},v_{-i})) for 0 ≤ i ≤ k ,
p_{-2i+1} ∈ σ(P(v_{-i},v_{-i+1})) for 1 ≤ i ≤ k , and
p_i ∈ σ(P(v_{i-1},v_i)) for 1 ≤ i ≤ j . Ignoring empty paths p_{-2i} , we get
a partition of p which satisfies (6c). It is straightforward but
tedious to show that this partition is unique. □
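
The ordering prescribed by Theorem 4 can be sketched as a small function.
This Python fragment is illustrative only. It assumes the expressions are
held in a dictionary keyed by (u, w), with a hypothetical tuple encoding in
which LAMBDA and EMPTY are the atomic constants; breaking ties within equal u
by increasing w in the first group is a choice made here so that a diagonal
element (P(u,u),u,u) precedes the other elements with the same u, as the
partition in the proof requires.

```python
LAMBDA = ("lam",)     # empty string (hypothetical encoding)
EMPTY = ("empty",)    # empty set

def path_sequence_from(P):
    """Order the expressions P[(u, w)] into a path sequence as in Theorem 4."""
    # Elements with u <= w, excluding the empty set and the empty string, by increasing u.
    up = sorted((u, w) for (u, w), R in P.items()
                if u <= w and R not in (EMPTY, LAMBDA))
    # Elements with u > w, excluding the empty set, by decreasing u.
    down = sorted(((u, w) for (u, w), R in P.items()
                   if u > w and R != EMPTY),
                  key=lambda uw: -uw[0])
    return [(P[uw], uw[0], uw[1]) for uw in up + down]

# Expressions for the two-vertex cycle a: 1->2, b: 2->1 (entered by hand here).
P = {(1, 1): LAMBDA, (1, 2): "a", (2, 1): "b",
     (2, 2): ("star", ("cat", "b", "a"))}
seq = path_sequence_from(P)
```

Propagating from source 1 along this sequence gives a·(b·a)* for vertex 2
and Λ ∪ a·(b·a)*·b for vertex 1.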

ELIMINATE thus gives us a way to construct path sequences. The resource
requirements of the method depend in a complicated way upon the sparsity
of G . By rearranging the computation in the loop of ELIMINATE and
using appropriate data structures we can implement ELIMINATE to run in

    O(n + Σ_{v=1}^{n} |{P(u,v) ≠ ∅ | u > v}| · |{P(v,w) ≠ ∅ | w > v}|)

time and O(l) storage space, where l is the length of the computed path
sequence [5,28]. (By only storing P(u,w) for pairs u,w such that eventually
P(u,w) ≠ ∅ , we can avoid spending O(n^2) time in initialization.)
For dense graphs the time bound is O(n^3 + m) and the space bound
is O(n^2) . For sparse graphs, the resource requirements depend upon
the vertex numbering chosen. Numerical analysts have devoted much
effort to finding good numbering schemes, both for arbitrary sparse
graphs and for graphs with special structure [5,8,22,28].
All their techniques except off-diagonal pivoting [11] apply to the
computation of path sequences.

In order to improve the efficiency of this method, we shall combine
it with two decomposition techniques. The idea is to break the problem
graph into subgraphs, apply ELIMINATE to construct a path sequence
for each subgraph, and combine these path sequences into a
path sequence for the original graph. Our first decomposition technique
is well-known to numerical analysts and uses the strong components of G .

Theorem 5. Suppose G = (V,E) is acyclic (i.e., each strong component
is a single vertex) and that the vertices of G are numbered in topological
order. Then the elements of {(e,h(e),t(e)) | e ∈ E} in increasing order
on h(e) comprise a path sequence.

Proof. Immediate. □

By Theorem 5, any acyclic graph has a path sequence of length m ,
which can be found in O(n+m) time using a linear-time topological
sorting procedure [19,25].
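
As an illustration of Theorem 5, the sketch below numbers the vertices of an
acyclic graph topologically and lists the edges by the number of their
source vertex. It is a minimal Python sketch under assumptions of our own:
edges are (name, h, t) triples, the topological numbering is obtained by
repeatedly removing vertices of in-degree zero (one standard method, not
necessarily that of [19,25]), and the final sort is a library sort rather
than a linear-time bucketing.

```python
from collections import deque

def acyclic_path_sequence(vertices, edges):
    """Return the triples (e, h(e), t(e)) ordered by topological number of h(e)."""
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for name, h, t in edges:
        succ[h].append(t)
        indeg[t] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    number = {}
    while queue:                                 # remove in-degree-zero vertices
        v = queue.popleft()
        number[v] = len(number) + 1
        for t in succ[v]:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    if len(number) != len(vertices):
        raise ValueError("graph is not acyclic")
    return sorted(edges, key=lambda e: number[e[1]])   # stable: ties keep input order

seq = acyclic_path_sequence(
    ["x", "y", "z"],
    [("b", "y", "z"), ("c", "x", "z"), ("a", "x", "y")])
```

Each element of the result is already a triple of the form required by a
path sequence, since a single edge is a path expression of type (h(e),t(e)).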

Theorem 6. Suppose G = (V,E) is a directed graph with strong
components G_1, G_2, ..., G_k ordered so that no edge leads from a component
G_i to a component G_j with j < i . For 1 ≤ i ≤ k , let X_i be a
path sequence for G_i , and let Y_i be a sequence consisting of the
elements of {(e,h(e),t(e)) | h(e) ∈ G_i and t(e) ∉ G_i} ordered arbitrarily.
(Note that Y_k is empty.) Then X_1, Y_1, X_2, Y_2, ..., X_{k-1}, Y_{k-1}, X_k is a
path sequence for G .

Proof. Immediate. □

Theorem 6 generalizes the method of Theorem 5 to arbitrary directed
graphs. We can find the strong components of a directed graph in O(n+m)
time using the algorithm of Tarjan [24]. Thus Theorem 6 gives a method
for finding a path sequence in O(n+m) time plus the time to find
path sequences for the strong components. The length of the sequence
is O(m) plus the total length of the strong components' sequences.


4.  Path Expressions for Reducible Flow Graphs.

Although  decomposition  using  strong  components  is  efficient  and 
useful  in  practice,  many  problem  graphs  have  one  or  only  a few  strong 
components. In the remaining sections of this paper we develop a more
powerful  decomposition  technique  based  upon  dominators.  We  begin  by 
considering  reducible  flow  graphs.  A flow  graph  G = (V,E,r)  is  a 
directed  graph  with  a distinguished  start  vertex  r such  that  every 
vertex  in  G is  reachable  from  r . By  Theorem  6 we  need  only  consider 
strongly  connected  graphs,  so  this  reachability  condition  is  no  restriction. 

A reducible flow graph G = (V,E,r) is a flow graph that can be
reduced to the graph consisting of the single vertex r and no edges
by means of the following transformations:

T1 (remove a loop): If e is an edge such that h(e) = t(e) , delete
    edge e .

T2 (remove a vertex): If w ≠ r is a vertex such that all edges e
    with t(e) = w have h(e) = v for some vertex v , contract w
    into v by deleting w and all edges entering w , and converting
    any edge e with h(e) = w into an edge e' with h(e') = v
    and t(e') = t(e) .
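
The two transformations can be applied mechanically to test reducibility.
The following Python sketch is for illustration only and is deliberately
naive (it rescans the edge list after every step, so it is far from the
efficient algorithms cited in the text). The function name and the
representation of edges as (h, t) pairs are assumptions of the sketch.

```python
def is_reducible(vertices, edges, r):
    """Apply loop removal and vertex removal until neither applies.
    edges is an iterable of (h, t) pairs; parallel edges are allowed."""
    edges = list(edges)
    alive = set(vertices)
    changed = True
    while changed:
        changed = False
        # Remove loops.
        kept = [(h, t) for h, t in edges if h != t]
        if len(kept) != len(edges):
            edges, changed = kept, True
        # Remove one vertex w != r all of whose entering edges leave a single vertex v.
        for w in list(alive):
            if w == r:
                continue
            preds = {h for h, t in edges if t == w}
            if len(preds) == 1:
                (v,) = preds
                # Delete edges entering w; redirect edges leaving w to leave v.
                edges = [(v if h == w else h, t) for h, t in edges if t != w]
                alive.remove(w)
                changed = True
                break
    return alive == {r} and not edges

loop_graph = is_reducible({1, 2, 3}, [(3, 2), (2, 1), (1, 2)], 3)
two_entry_cycle = is_reducible({1, 2, 3}, [(1, 2), (1, 3), (2, 3), (3, 2)], 1)
```

The second example is the familiar cycle with two entries from the start
vertex, to which neither transformation applies.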

This definition is due to Hecht and Ullman [14]; there are many other
equivalent definitions of reducible flow graphs [12,14,15,26]. Intuitively,
a flow graph is reducible if every cycle has a single entry from the
start vertex. These graphs play an important role in global flow analysis,
because the control flow of a reasonably well-structured program can be
modelled by a reducible flow graph [5,20].

As the reduction by T1 and T2 takes place, each vertex in the
reduced graph represents a subgraph of the original graph, called a
region, and each edge in the reduced graph represents an edge in the
original graph. We define this notion formally as follows.

(7a) Each vertex and edge in the original graph represents itself.

(7b) If T1 is applied to delete an edge e , then vertex h(e) = t(e)
     in the reduced graph represents the union of what h(e) and e
     represent.

(7c) If T2 is applied to contract vertex w into vertex v , then
     v in the reduced graph represents the union of what v , w ,
     and all the deleted edges e with h(e) = v , t(e) = w
     represent. Any new edge e' represents what the corresponding
     old edge e represents.

It is not hard to show that each region is indeed a subgraph of G
and that the regions corresponding to the vertices of any reduced graph
are vertex-disjoint [31]. Furthermore every region I has a unique
header vertex v such that any edge e with h(e) ∉ I , t(e) ∈ I has
t(e) = v [31]. The header is the unique vertex in the region which has
not yet been contracted into another vertex. When the reduction is
complete, r represents a region comprising the entire graph G .

If a flow graph is reducible, there is a reduction order
v_1, v_2, ..., v_{n-1}, v_n = r
of the vertices such that the graph can be reduced to r in the following
way [26]. For i from 1 to n-1 , we apply T1 to delete all loops
at v_i ; then we apply T2 to contract v_i into another vertex v_j with
j > i . After deleting all vertices except v_n = r , we apply T1 to
delete all loops at r . This way of carrying out the reduction has the
following property. If we regard the repeated application of T1 at a
vertex v_i followed by the application of T2 to delete v_i as a single
step, then between any two steps the entry vertex of any region has no
edges entering it from within the region.

We shall assume henceforth that the vertices of G are numbered
from 1 to n in a reduction order and identified by number. We shall
also assume that header(v) for v ≠ r is the vertex into which v is
eventually contracted, that cycle(v) for any vertex v is the set of
edges in G represented by edges deleted when applying T1 to delete loops
at v , and that noncycle(v) for v ≠ r is the set of edges in G
represented by edges deleted when applying T2 to delete v . The following
lemma states some basic properties of header , cycle , and noncycle.

Lemma 3. Suppose G is a reducible flow graph whose vertices are
numbered in a reduction order. Let v be any vertex and let e be
any edge. Then

(i)   if v ≠ r , header(v) > v ;

(ii)  either h(e) = header(t(e)) or h(e) ≤ t(e) ;

(iii) if e ∈ cycle(t(e)) then header^i(h(e)) = t(e) for some i ≥ 0 ; and

(iv)  if e ∈ noncycle(t(e)) then header^i(h(e)) ≠ t(e) for all i ≥ 0
      but header^i(h(e)) = header(t(e)) for some i ≥ 0 .

Proof.  Straightforward.  □ 

The algorithm of Tarjan [26] computes a reduction order and
associated arrays header , cycle , and noncycle in O(m α(m,n))
time. Using this information we can solve the single-source path
expression problem whose source vertex is r . The algorithm
resembles the methods of Ullman [31] and Graham and Wegman [12] for
solving "forward" data flow problems; we discuss this resemblance at
the end of the section.

The algorithm computes path expressions as the reduction proceeds,
using a data structure representing the current regions. The data
structure consists of a forest whose vertices are the vertices of G
and whose edges are the pairs (header(v),v) such that v has been
contracted into header(v) . Thus this header forest consists of one
tree per region; the tree representing a region contains exactly the
vertices in the region and has the header of the region as its root.
With every vertex v in the forest is associated a non-redundant path
expression R(v) . The algorithm manipulates the forest by means of
four operations:

INITIALIZE(v):  Form a tree with one vertex v and associated path
                expression R(v) := Λ .

UPDATE(v,R):    If v is a root, assign R(v) := R .

LINK(v,w):      If v and w are roots, combine the trees with
                roots v and w by making v the parent of w .

EVAL(v):        If r = v_0 → v_1 → v_2 → ... → v_k = v is the tree
                path from the root r of the tree containing v
                to v , return a non-redundant path expression
                equivalent to R(v_0) · R(v_1) · ... · R(v_k) .

The algorithm maintains the following invariant: If I is a region and
v is a vertex in I , then EVAL(v) represents exactly all paths in I
from the header of I to v .


finalize:   P(r,r) := ∅;
            for each e ∈ cycle(r) do P(r,r) := [P(r,r) ∪ [EVAL(h(e))·e]] od;
            P(r,r) := [P(r,r)*];
            for v := 1 until n-1 do P(r,v) := [P(r,r)·EVAL(v)] od
end REDUCE;
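
The overall shape of REDUCE can be sketched in Python as follows. This is a
reconstruction offered for illustration, not the authors' text: the
initialization and main loop are inferred from the proof of Lemma 4 and from
the operation counts given later in this section (P collects EVAL(h(e))·e
over noncycle(v), Q collects it over cycle(v), and R(v) becomes [P·[Q*]]).
The arrays head, header, cycle, and noncycle are assumed to be supplied as
dictionaries, the forest is walked naively without path compression, and
expressions use a hypothetical nested-tuple encoding.

```python
LAMBDA = ("lam",)     # empty string (hypothetical encoding)
EMPTY = ("empty",)    # empty set

def union(a, b):
    return b if a == EMPTY else a if b == EMPTY else ("union", a, b)

def cat(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    return b if a == LAMBDA else a if b == LAMBDA else ("cat", a, b)

def star(a):
    return LAMBDA if a in (EMPTY, LAMBDA) else ("star", a)

def reduce_flow_graph(n, head, header, cycle, noncycle):
    """Vertices 1..n are in reduction order with r = n; head[e] is h(e)."""
    parent = {v: None for v in range(1, n + 1)}      # INITIALIZE for every vertex
    R = {v: LAMBDA for v in range(1, n + 1)}

    def EVAL(v):
        chain = []
        while v is not None:                         # walk up to the root
            chain.append(v)
            v = parent[v]
        result = LAMBDA
        for u in reversed(chain):                    # multiply labels root-first
            result = cat(result, R[u])
        return result

    def edge_union(edge_names):
        acc = EMPTY
        for e in edge_names:
            acc = union(acc, cat(EVAL(head[e]), e))
        return acc

    for v in range(1, n):
        P = edge_union(noncycle[v])
        Q = edge_union(cycle[v])
        R[v] = cat(P, star(Q))                       # UPDATE(v, [P.[Q*]])
        parent[v] = header[v]                        # LINK(header(v), v)
    r = n
    Prr = star(edge_union(cycle[r]))                 # finalize
    paths = {r: Prr}
    for v in range(1, n):
        paths[v] = cat(Prr, EVAL(v))
    return paths

# Example: r = 3 with edges a: 3->2, b: 2->1, c: 1->2.
paths = reduce_flow_graph(
    3, {"a": 3, "b": 2, "c": 1}, {1: 2, 2: 3},
    {1: [], 2: ["c"], 3: []}, {1: ["b"], 2: ["a"]})
```

In the example the result for vertex 2 is a·(b·c)* and the result for
vertex 1 is a·(b·c)*·b.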


Lemma 4. After the v-th iteration of the loop in REDUCE, EVAL(u)
for any vertex u represents exactly all paths in the current region I
containing u from the header of I to u .

Proof. By induction on v . The lemma is certainly true before the
first iteration of the loop. Suppose the lemma is true before the v-th
iteration of the loop. Let I_1 be the current region containing v and
let I_2 be the current region containing header(v) . Let I_3 be the
region containing v after T1 is applied to eliminate all loops at v .
Let I_4 be the region containing v after T2 is applied to contract
v into header(v) ; i.e., after the v-th iteration of the loop.
I_3 consists of I_1 and the edges in cycle(v) . I_4 consists of
I_3 , I_2 , and the edges in noncycle(v) ; the header of I_4 is the
header of I_2 .

I_1 contains no edges entering v . It follows from the induction
hypothesis that the value of Q after the v-th iteration is a non-redundant
path expression representing all paths from v to v in I_3 which do not
contain v as an intermediate vertex. Thus Q* represents all paths in
I_3 from v to v . It also follows from the induction hypothesis that
the value of P after the v-th iteration is a non-redundant path expression
representing all paths in I_4 from the header of I_4 to v which do not
contain v as an intermediate vertex.

If u is a vertex in I_2 , then the paths in I_4 from the header
of I_4 to u are exactly the paths in I_2 from the header of I_2
to u . If u is a vertex in I_3 , the paths in I_4 from the header
of I_4 to u are exactly the paths p partitionable into
p = p_1, p_2, p_3 , where p_1 ∈ σ(P) , p_2 ∈ σ(Q*) , and p_3 is a path in
I_1 from the header of I_1 to u . Thus adding edge (header(v),v)
to the forest and replacing the old value (Λ) of R(v) by [P·[Q*]]
guarantees that the lemma holds after the v-th iteration of the loop. □

Corollary 1. After execution of REDUCE, R(v) for any vertex v ≠ r
is a non-redundant path expression representing exactly the set of
paths from header(v) to v all of whose intermediate vertices are
smaller than header(v) .

Proof. For any vertex v ≠ r , let I_v be the region containing v
after the v-th iteration of the loop in REDUCE. Let R(v) be the path
expression computed for v during this iteration. By Lemma 4,
R(v) is a non-redundant path expression representing all paths in
I_v from header(v) to v . Any path in G from header(v) to v
which leaves I_v must contain header(v) twice, since the only way
to enter I_v is through header(v) . □

Theorem 7. Let v be any vertex. After execution of REDUCE, P(r,v)
is a non-redundant path expression representing all paths from r to v .

Proof. Lemma 4 holds after the last iteration of the loop in REDUCE.
A proof similar to that of Lemma 4 shows that P(r,r) as computed in
the final part of REDUCE is a non-redundant path expression representing
all paths from r to r in G . It follows from Lemma 4 that the
computed value of P(r,v) for v ≠ r is a non-redundant path expression
representing all paths from r to v in G . □

Procedure REDUCE requires O(n+m) time plus time for n calls
on INITIALIZE, n-1 calls on UPDATE, n-1 calls on LINK, and m+n-1
calls on EVAL; thus the forest manipulation operations dominate the
running time of the algorithm. Tarjan [29] describes two ways to
implement the forest operations. The first is a simple method
called path compression which requires O(m log n) time. The second
is a sophisticated off-line method which by preprocessing the entire
sequence of EVAL and LINK operations is able to perform all the forest
manipulation in O(m α(m,n)) time. (It is easy to precompute the
sequence of EVAL and LINK operations performed by REDUCE.) Farrow [9]
presents another O(m α(m,n))-time method called stratified path
compression. This method has the advantage of being on-line, although
the proof of its time bound is very complicated.

By using either of the O(m α(m, n))-time algorithms for forest
manipulation we obtain a moderately complicated O(m α(m, n))-time
implementation of REDUCE. By using path compression we obtain an
O(m log n)-time implementation of REDUCE which is remarkably simple
and efficient. We favor the latter implementation for practical
applications.

Ullman's algorithm for forward data flow analysis [31] is essentially
identical to REDUCE except that it uses 2-3 trees to carry out the forest
operations. Its time bound is O(m log n) but it is more complicated
than our method using path compression. Graham and Wegman's algorithm [12]
is a version of REDUCE which uses no auxiliary data structure but carries
out a form of path compression on the original graph. Its time bound
is O(m log n) but it also is more complicated than our method using
path compression. Experimental comparisons between these methods would
be valuable.

5.  Computing Path Sequences for Reducible Flow Graphs.

Some  kinds  of  data  flow  analysis,  such  as  the  computation  of  live 
variables  [17],  require  that  information  be  propagated  backward  rather 
than  forward  through  the  control  flow  graph  of  the  program.  We  can 
carry  out  such  backward  data  flow  analysis  by  solving  a single-source 
path  problem  on  the  reverse  of  the  control  flow  graph.  Since  reducibility 
is not preserved by graph reversal, the algorithm of Section 4 is
inadequate  for  this  purpose.  In  this  section,  we  shall  modify  REDUCE 
so  that  it  computes  a path  sequence  for  any  reducible  flow  graph.  By 
using  such  a path  sequence  and  applying  Theorem  6 if  necessary,  we  can 
solve  single-  and  multi-source  path  problems  on  any  flow  graph  which  is 
reducible  or  whose  reverse  is  reducible.  This  provides  an  efficient  way 
to  do  backward  data  flow  analysis. 

In order to develop this algorithm, we need to examine the implementation
of the header forest operations. We shall describe a generic implementation
of which path compression [29] and stratified path compression [9]
are special cases. We shall use this generic implementation in an
extension of REDUCE which computes path sequences.

The generic implementation uses a compressed forest to represent the
header forest. With each vertex v of the compressed forest is
associated a path expression S(v). The method maintains the following
invariants.

(8a)  For each tree T in the header forest, there is a corresponding
      tree T' of the compressed forest which contains the same
      vertices as T.

(8b)  If v → w in a tree T' of the compressed forest, then
      v →+ w in T (that is, v is a proper ancestor of w in T).
      In particular, corresponding trees T and T' have the same root.

(8c)  For any vertex v, let r = v_0 → v_1 → ... → v_k = v be the
      path in the header forest from a root to v, and let
      r = w_0 → w_1 → ... → w_l = v be the path in the compressed
      forest from a root to v. Then R(v_0) · R(v_1) · ... · R(v_k)
      and S(w_0) · S(w_1) · ... · S(w_l) are equivalent non-redundant
      path expressions.

The  compressed  forest  is  represented  by  an  array  ancestor  such 
that  ancestor (v)  is  the  parent  of  v in  the  compressed  forest;  if 
ancestor(v)  = 0 then  v is  a root.  The  following  procedures  implement 
the  forest  operations. 

procedure INITIALIZE(v);
   begin ancestor(v) := 0; S(v) := Λ end;

procedure UPDATE(v, R);
   S(v) := R;

procedure LINK(v, w);
   ancestor(w) := v;

regular expression procedure EVAL(v);
begin
   non-deterministically execute COMPRESS(u) for an
      arbitrary sequence of vertices u;
   let v_0, v_1, ..., v_k be such that v_k = v, ancestor(v_i) = v_{i-1}
      for 1 ≤ i ≤ k, and ancestor(v_0) = 0;
   EVAL := if k = 0 → Λ
           [] k ≠ 0 → S(v_1) · S(v_2) · ... · S(v_k) fi
end EVAL;

procedure COMPRESS(u);
   if ancestor(ancestor(u)) ≠ 0 →
      S(u) := S(ancestor(u)) · S(u);
      ancestor(u) := ancestor(ancestor(u)) fi;

It is evident that COMPRESS preserves (8a)-(8c); thus the procedures
above are a valid implementation of the header forest operations. The
following lemma is easy to prove using the results in Section 4.

Lemma 5.  If v is any vertex such that ancestor(v) ≠ 0, then S(v)
is a non-redundant path expression representing exactly the set of paths
from ancestor(v) to v all of whose intermediate vertices are smaller
than ancestor(v).

EVAL is a non-deterministic procedure which is free to choose an
arbitrary sequence of vertices u on which to execute COMPRESS(u).
We obtain a specific implementation by including a mechanism for making
this choice. Path compression uses the following version of EVAL.

regular expression procedure EVAL(v);
   if ancestor(v) = 0 → EVAL := Λ
   [] ancestor(v) ≠ 0 → PATH_COMPRESS(v); EVAL := S(v) fi;

procedure PATH_COMPRESS(v);
   if ancestor(ancestor(v)) ≠ 0 →
      PATH_COMPRESS(ancestor(v));
      S(v) := S(ancestor(v)) · S(v);
      ancestor(v) := ancestor(ancestor(v)) fi;

Stratified path compression uses a more complicated compression mechanism
which requires the maintenance of additional data structures [9].
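
The path-compression version of the forest operations can be sketched in
Python as follows. This is only an illustration under stated assumptions:
path expressions are represented as plain strings (concatenation stands
for the operation · and the empty string stands for Λ), vertices are the
integers 1..n, and the class name CompressedForest and its method names
are ours, not the paper's.

```python
class CompressedForest:
    """Sketch of INITIALIZE / UPDATE / LINK / EVAL with path compression.

    Assumption: path expressions are strings; "" plays the role of Lambda.
    """

    def __init__(self, n):
        # INITIALIZE for every vertex 1..n: ancestor 0 marks a root.
        self.ancestor = [0] * (n + 1)
        self.S = [""] * (n + 1)

    def update(self, v, R):
        # UPDATE(v, R): S(v) := R
        self.S[v] = R

    def link(self, v, w):
        # LINK(v, w): ancestor(w) := v
        self.ancestor[w] = v

    def _path_compress(self, v):
        # PATH_COMPRESS(v): first compress the ancestor, then splice v
        # directly below the root, composing the expressions on the way.
        a = self.ancestor[v]
        if self.ancestor[a] != 0:
            self._path_compress(a)
            self.S[v] = self.S[self.ancestor[v]] + self.S[v]
            self.ancestor[v] = self.ancestor[self.ancestor[v]]

    def eval(self, v):
        # EVAL(v): Lambda for a root, otherwise S(v) after compression.
        if self.ancestor[v] == 0:
            return ""
        self._path_compress(v)
        return self.S[v]
```

The recursion depth equals the length of the ancestor chain, so a
production version would presumably use an explicit stack; the recursive
form is kept here because it mirrors the procedure above line by line.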

The  following  version  of  REDUCE  uses  the  generic  implementation  of 
the  header  forest  operations  to  compute  a path  sequence.  Procedures 
EVAL  and  COMPRESS  are  modified  so  that  they  add  elements  to  the  path 
sequence  as  a side  effect. 


procedure REDUCE_AND_SEQUENCE;
begin
initialize:  for each v ∈ V do INITIALIZE(v) od;
             sequence := the empty sequence;
loop:        for v := 1 until n-1 do
                P := ∅; Q := ∅;
                for each e ∈ noncycle(v) do P := [P ∪ EVAL_AND_SEQUENCE(e)] od;
                for each e ∈ cycle(v) do Q := [Q ∪ EVAL_AND_SEQUENCE(e)] od;
add 1:          if [Q*] ≠ Λ → add ([Q*], v, v) to sequence fi;
                UPDATE(v, [P·[Q*]]);
                LINK(header(v), v) od;
finalize:    Q := ∅;
             for each e ∈ cycle(r) do Q := [Q ∪ EVAL_AND_SEQUENCE(e)] od;
add 2:       if [Q*] ≠ Λ → add ([Q*], r, r) to sequence fi;
             for v := n-1 by -1 until 1 do add (S(v), ancestor(v), v) to sequence od
end REDUCE_AND_SEQUENCE;


regular expression procedure EVAL_AND_SEQUENCE(e);
begin
   non-deterministically execute COMPRESS_AND_SEQUENCE(u) for
      an arbitrary sequence of vertices u;
   let v_0, v_1, ..., v_k be such that h(e) = v_k, ancestor(v_i) = v_{i-1}
      for 1 ≤ i ≤ k, and ancestor(v_0) = 0;
   if k = 0 → EVAL_AND_SEQUENCE := e
   [] k ≠ 0 → EVAL_AND_SEQUENCE := S(v_k)·e;
              for i := k-1 by -1 until 1 do
                 add (EVAL_AND_SEQUENCE, v_i, t(e)) to sequence;
                 EVAL_AND_SEQUENCE := S(v_i) · EVAL_AND_SEQUENCE od fi
end EVAL_AND_SEQUENCE;

procedure COMPRESS_AND_SEQUENCE(u);
   if ancestor(ancestor(u)) ≠ 0 →
      add (S(u), ancestor(u), u) to sequence;
      S(u) := S(ancestor(u)) · S(u);
      ancestor(u) := ancestor(ancestor(u)) fi;
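
To make the side effects concrete, here is a small Python sketch of
COMPRESS_AND_SEQUENCE and EVAL_AND_SEQUENCE. It rests on several
assumptions of ours: path expressions are strings, an edge is a tuple
(h, t, label), the arrays ancestor and S are Python lists indexed from 1
with ancestor[0] = 0, and the non-deterministic choice of compressions is
supplied by the caller through the hypothetical parameter compress
(empty by default, which corresponds to carrying out no compression).

```python
def compress_and_sequence(u, ancestor, S, sequence):
    """COMPRESS_AND_SEQUENCE(u): record the old triple, then compress."""
    a = ancestor[u]
    if a != 0 and ancestor[a] != 0:
        sequence.append((S[u], a, u))
        S[u] = S[a] + S[u]
        ancestor[u] = ancestor[a]


def eval_and_sequence(e, ancestor, S, sequence, compress=()):
    """EVAL_AND_SEQUENCE(e) for an edge e = (h, t, label)."""
    h, t, label = e
    # The caller chooses which compressions to perform (possibly none).
    for u in compress:
        compress_and_sequence(u, ancestor, S, sequence)
    # chain = [v_0, ..., v_k] with v_k = h(e) and v_0 a root.
    chain = [h]
    while ancestor[chain[-1]] != 0:
        chain.append(ancestor[chain[-1]])
    chain.reverse()
    k = len(chain) - 1
    if k == 0:
        return label
    expr = S[chain[k]] + label
    for i in range(k - 1, 0, -1):
        # Each partial expression is emitted as a triple ending at t(e).
        sequence.append((expr, chain[i], t))
        expr = S[chain[i]] + expr
    return expr
```

Both choices of compress in the accompanying checks return the same
expression; only the triples appended to the sequence differ, which is
the freedom the generic procedure leaves open.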

Theorem  8.  The  sequence  computed  by  REDUCE_AND_SEQUENCE  is  a path 
sequence  for  G . 

Proof.  The proof is similar to the proof of Theorem 4 but a little more
complicated. We shall assume for purposes of the proof that statement
add 1 always adds ([Q*], v, v) to sequence, whether or not [Q*] = Λ;
similarly for statement add 2. This modification does not affect the
properties of sequence in which we are interested.

Lemma 5 and an inspection of REDUCE_AND_SEQUENCE show that the computed
sequence satisfies (6a) and (6b). To prove (6c), let p be an arbitrary
path in G. Let v_0 = h(p). For i ≥ 1, let v_i be the first vertex
on p such that v_i > v_{i-1}. Let v_k be the last vertex so defined
(v_k is the largest vertex on p). Let v_{k+1} = t(p). Let p_{2k} be the
part of p from the first occurrence of v_k to the last occurrence of v_k.
Let p_{2k+1} be the part of p following p_{2k}. For 0 ≤ i ≤ k-1, let
p_{2i+1} be the part of p from the last occurrence of v_i before p_{2i+2}
to the beginning of p_{2i+2}. Let p_{2i} be the part of p from the first
occurrence of v_i to the beginning of p_{2i+1}. Then p = p_0, p_1, ..., p_{2k+1},
where p_{2i} for 0 ≤ i ≤ k is a path from v_i to v_i containing no
vertex greater than v_i, and p_{2i+1} for 0 ≤ i ≤ k is a path from v_i
to v_{i+1} all of whose intermediate vertices are less than v_i.

For 0 ≤ i ≤ k, p_{2i} ∈ σ(Q(v_i)*), where Q(v_i) for v_i ≠ r
is the value of Q computed during the v_i-th iteration of the loop
in REDUCE_AND_SEQUENCE, and Q(r) is the value of Q computed during
the final part of REDUCE_AND_SEQUENCE. In order to represent p as
in (6c), it remains for us to (i) partition each path p_{2i+1} for
0 ≤ i ≤ k-1 into a sequence of paths represented by triples appearing
in sequence between ([Q(v_i)*], v_i, v_i) and ([Q(v_{i+1})*], v_{i+1}, v_{i+1}),
and (ii) partition p_{2k+1} into a sequence of paths represented by
triples appearing in sequence after ([Q(v_k)*], v_k, v_k).

Consider any path p_{2i+1} for 0 ≤ i ≤ k-1. Let e_i be the last
edge on this path. Then t(e_i) = v_{i+1}, and h(e_i) is a descendant
of v_i in the compressed tree just after the v_i-th iteration of the
loop in REDUCE_AND_SEQUENCE. We partition p_{2i+1} into
p_{2i+1} = p_{2i+1,0}, p_{2i+1,1}, ..., p_{2i+1,l} as follows. Let j = 0 and
p^(0)_{2i+1} = p_{2i+1}. Repeat the following step until it no longer applies.

      Suppose h(e_i) is not a descendant of h(p^(j)_{2i+1}) in
      the compressed tree when edge e_i is processed by REDUCE.
      Consider the moment when h(e_i) becomes a non-descendant
      of h(p^(j)_{2i+1}). This event must be caused by an execution
      of COMPRESS(u) such that ancestor(u) = h(p^(j)_{2i+1}).
      Let p_{2i+1,j} be the part of p^(j)_{2i+1} from the beginning
      of p^(j)_{2i+1} to the last occurrence of u. Partition
      p^(j)_{2i+1} into p^(j)_{2i+1} = p_{2i+1,j}, p^(j+1)_{2i+1}, and
      replace j by j+1.

Consider a single execution of the general step. Path p^(j)_{2i+1} must
contain u since h(p^(j)_{2i+1}) →+ u →* h(e_i) in the header tree. Thus
p^(j)_{2i+1} can be partitioned as stated. Execution of COMPRESS(u) causes
(S(u), h(p^(j)_{2i+1}), u) to be added to sequence; p_{2i+1,j} ∈ σ(S(u)).
After execution of COMPRESS, h(e_i) is a descendant of u = h(p^(j+1)_{2i+1})
in the compressed tree.

Suppose the general step is executed l times. Let p_{2i+1,l} = p^(l)_{2i+1}.
By the discussion above, there is a subsequence of triples
(P_0, u_0, w_0), (P_1, u_1, w_1), ..., (P_{l-1}, u_{l-1}, w_{l-1}) appearing in
sequence after ([Q(v_i)*], v_i, v_i) and before triples of the form
(P, u, v_{i+1}), and such that p_{2i+1,j} ∈ σ(P_j) for 0 ≤ j ≤ l-1.
Furthermore h(e_i) is a descendant of h(p_{2i+1,l}) in the compressed tree
just after all compression is finished during the execution of
EVAL_AND_SEQUENCE(e_i). The operation of EVAL_AND_SEQUENCE(e_i) adds to
sequence a triple (P_l, h(p_{2i+1,l}), v_{i+1}) such that p_{2i+1,l} ∈ σ(P_l).
Thus we obtain a satisfactory partition of p_{2i+1}.

The partitioning of p_{2k+1} is the same as the partitioning of
p_{2i+1} for 0 ≤ i ≤ k-1 except that the path p_{2k+1,l} must be further
partitioned into paths represented by triples (S(v), ancestor(v), v)
added to sequence during the final part of REDUCE_AND_SEQUENCE.
The details are straightforward.

We obtain by the method above a partition of an arbitrary path p
which satisfies (6c) if we ignore empty paths in the partition.
Showing that the partition is unique is tedious but not difficult.
The crucial point is that for any pair u > v, only one triple of
the form (P, u, v) appears in sequence. We leave the details to the
reader.  □

REDUCE_AND_SEQUENCE requires O(m log n) time to construct a path
sequence if path compression is used to implement the forest operations
and O(m α(m, n)) if stratified path compression is used. The length of
the path sequence constructed is proportional to the running time. It
is interesting to note that the version of the algorithm which carries
out no compression generates essentially the same path sequence as
ELIMINATE.


6.  Decomposition  Using  Dominators. 


In this section we generalize the algorithm of Section 5 so that
it becomes a decomposition method applicable to all graphs. The
reducible graphs play a role in this method analogous to the role of
acyclic graphs in decomposition by strong components. Just as a graph
is acyclic if and only if all its strong components are single vertices,
a graph is reducible if and only if all its components in the new
decomposition are single vertices.

The concept we use is that of a single-entry region, which we make
precise as follows. For an arbitrary flow graph G = (V, E, r), we say
a vertex v dominates another vertex w if v ≠ w and v lies on
every path from r to w.
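
The definition of dominance can be checked directly, if inefficiently, by
deleting each vertex in turn and seeing what becomes unreachable from r.
The Python sketch below does exactly that; it is meant only to illustrate
the definition (it takes O(nm) time) and is not the algorithm used in this
paper. The graph representation (a dictionary succ mapping each vertex to
a list of successors) and the function names are our assumptions.

```python
def reachable(succ, r, removed=None):
    """Vertices reachable from r when vertex `removed` is deleted."""
    seen = set()
    stack = [r] if r != removed else []
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        for y in succ.get(x, ()):
            if y != removed and y not in seen:
                stack.append(y)
    return seen


def dominators(succ, r):
    """Return {w: set of vertices v != w lying on every path from r to w}."""
    all_v = reachable(succ, r)
    dom = {w: set() for w in all_v}
    for v in all_v:
        still = reachable(succ, r, removed=v)
        # v dominates exactly the other vertices cut off by deleting v.
        for w in all_v - still - {v}:
            dom[w].add(v)
    return dom
```

For the diamond-shaped graph with edges 1→2, 1→3, 2→4, 3→4, 4→5 and
start vertex 1, vertex 4 is dominated only by 1, while vertex 5 is
dominated by both 1 and 4.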

Lemma 6 [1].  There is a tree T, called the dominator tree of G,
such that v is a proper ancestor of w in T if and only if v
dominates w. Vertex r is the root of T and T contains every
vertex in G.

For any vertex v ≠ r, we denote by idom(v) the parent of v
in T. Vertex idom(v) is called the immediate dominator of v and
is the unique vertex which dominates v and is dominated by every other
dominator of v. The dominator tree defines the single-entry regions
of G; the following lemma is a technical statement of this fact.
(Note the similarity between this lemma and Lemma 3.)

Lemma 7.  For any edge e, idom(t(e)) is an ancestor of h(e) in T.

Proof.  Every path from r to t(e) contains idom(t(e)). By adding
edge e to any path from r to h(e), we get a path from r to t(e).
Thus any path from r to h(e) contains idom(t(e)), and by Lemma 6
idom(t(e)) →* h(e) in T.  □

For any edge e, let e' be an edge such that t(e') = t(e) and
h(e') = h(e) if h(e) = idom(t(e)), h(e') = u where
idom(t(e)) → u →* h(e) in T if h(e) ≠ idom(t(e)). Let
G' = (V, E', r), where E' = {e' | e ∈ E}. We call G' the derived graph
of G. Figures 1-3 illustrate a graph, its dominator tree, and its
derived graph. Note that there are three kinds of edges in the derived
graph. If h(e) = idom(t(e)), then e' = e is an edge in T. If
t(e) →* h(e) in T then e' is a loop. Otherwise e' leads from one
sibling to another in T.
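
The construction of the derived graph can be sketched in Python as
follows. The sketch assumes that edges are pairs (h, t) leading from h
to t, that idom is a dictionary giving the immediate dominator of every
vertex other than r, and that edges entering r are simply skipped, since
idom(r) is undefined; the function name derived_edges is ours.

```python
def derived_edges(edges, idom, r):
    """Map each edge (h, t) with t != r to its derived edge (h', t).

    h' is h itself when h = idom(t); otherwise h' is the child of
    idom(t) that is an ancestor of h in the dominator tree.
    """
    out = []
    for h, t in edges:
        if t == r:
            continue  # assumption of this sketch: edges into r are skipped
        target = idom[t]
        u = h
        if u != target:
            # Walk up the dominator tree until u is a child of idom(t).
            # By the lemma above, idom(t) is an ancestor of h, so the
            # walk stops before reaching r.
            while idom[u] != target:
                u = idom[u]
        out.append((u, t))
    return out
```

In the accompanying check, the edge (4, 3) is mapped to the sibling edge
(2, 3) and the edge (4, 2) becomes the loop (2, 2), while edges leaving an
immediate dominator are unchanged; these are the three kinds of derived
edges distinguished above.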

[Figure  1] 

[Figure  2] 

[Figure  3] 

We call the strong components of the derived graph G' the dominator
strong components of G. It is not hard to prove that a graph is reducible
if and only if all its dominator strong components are single vertices.
The idea of our algorithm is to use Gaussian elimination (or some other
method) to compute a path sequence for each dominator strong component
of G, and to combine these path sequences to form a path sequence for G
by using a combination of the methods in Sections 3 and 5. The algorithm
manipulates the dominator tree in the same way that REDUCE_AND_SEQUENCE
manipulates the tree defined by the header pointers. Henceforth when
we refer to descendants and ancestors we mean with respect to the
dominator tree T.


The algorithm assumes that the dominator tree of G is known and
that the vertices are numbered from 1 to n so that idom(v) > v
for each vertex v ≠ r. The algorithm requires the following information:
for each vertex u, the set children(u) of vertices v such that
idom(v) = u, the set tree(u) of edges e such that t(e) = u and
h(e) = idom(u), and the set nontree(u) of edges e such that
t(e) = u and h(e) ≠ idom(u); for each edge e, the corresponding
edge e' in the derived graph G'. This information and the vertex
numbering can be computed in O(m α(m, n)) time using the dominators
algorithm of Lengauer and Tarjan [21].

The algorithm groups together vertices with a common parent and
processes these sibling sets in increasing order by parent. The algorithm
processes the set of siblings children(u) for each vertex u as
follows. For each edge e whose corresponding edge e' in the derived
graph G' has h(e') a child of u, the algorithm uses EVAL_AND_SEQUENCE
to compute a path expression P(e) representing all paths in G from
h(e') to t(e) which end with edge e and contain only proper descendants
of h(e') as intermediate vertices. Then the algorithm computes a path
sequence X_u for the subgraph G'_u of G' induced by children(u).
Substituting P(e) for e' for each edge e' appearing in this path
sequence produces a sequence Y_u that represents every path in G
starting and ending at a child of u and containing only proper
descendants of u as intermediate vertices.

The algorithm concatenates Y_u onto the end of the path sequence.
By applying SOLVE to Y_u, the algorithm computes for each child v
of u a path expression R(v) which represents all paths in G from
u to v containing only proper descendants of u as intermediate
vertices. The algorithm completes the processing of the sibling set
by executing UPDATE(v, R(v)); LINK(u, v) for each child v of u.

The algorithm finishes by computing a path expression Q representing
all paths from r to r and adding additional triples to the path
sequence just as REDUCE_AND_SEQUENCE does. The algorithm appears in more
detail below.


procedure DECOMPOSE_AND_SEQUENCE;
begin
initialize:  for each v ∈ V do INITIALIZE(v) od;
             sequence := the empty sequence;
loop:        for u := 1 until n do
derive:         for each v ∈ children(u) do
                   for each e ∈ nontree(v) do
                      P(e) := EVAL_AND_SEQUENCE(e) od od;
eliminate:      compute a path sequence X_u for G'_u;
substitute:     form Y_u from X_u by replacing each occurrence of an
                   edge e' in a path expression by P(e);
                sequence := sequence concatenated with Y_u;
solve:          for each v ∈ children(u) do R(v) := ∅;
                   for each e ∈ tree(v) do R(v) := [R(v) ∪ e] od od;
                for each (P, w, x) ∈ Y_u in order do
                   if w = x → R(w) := [R(w)·P]
                   [] w ≠ x → R(x) := [R(x) ∪ [R(w)·P]] fi od;
update:         for each v ∈ children(u) do
                   UPDATE(v, R(v)); LINK(u, v) od od;
finalize:


This method combines the techniques of Section 3 with the method
of Section 5. The parts of the program labelled initialize, derive,
update, and finalize are adapted from REDUCE_AND_SEQUENCE and serve
to combine the path sequences computed for the dominator strong components
(in eliminate and substitute) into a path sequence for the entire
graph. The two loops labelled solve comprise a version of SOLVE.
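
The two solve loops can be illustrated with a short Python sketch. It is
a sketch under our own assumptions: path expressions are strings, the
empty set ∅ is represented by None, Λ by the empty string, the helper
functions union and concat perform only the simplest bracket
simplifications, and the operators are written "|" and "." in the
output.

```python
def union(a, b):
    """[a U b] with the simplification that the empty set is dropped."""
    if a is None:
        return b
    if b is None:
        return a
    return f"({a} | {b})"


def concat(a, b):
    """[a . b]; the empty set annihilates and Lambda ("") is the identity."""
    if a is None or b is None:
        return None
    if a == "":
        return b
    if b == "":
        return a
    return f"{a}.{b}"


def solve_children(children, tree_edges, Y):
    """Compute R(v) for each child v from the triples (P, w, x) of Y, in order."""
    R = {}
    for v in children:
        R[v] = None
        for e in tree_edges.get(v, ()):
            R[v] = union(R[v], e)
    for P, w, x in Y:
        if w == x:
            R[w] = concat(R[w], P)
        else:
            R[x] = union(R[x], concat(R[w], P))
    return R
```

With two children 2 and 3 reached by tree edges a and b, and the sequence
("c*", 2, 2), ("d", 2, 3), the sketch yields R(2) = a.c* and
R(3) = (b | a.c*.d).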

We can implement step eliminate using ELIMINATE on the strong
components of G'_u and combining the results as described in Theorem 6.
Step substitute can be performed either after or during the computation
of X_u; the latter is preferable.

The next lemma expresses the properties of the values computed by
DECOMPOSE_AND_SEQUENCE; its proof combines the ideas in Theorem 1 and
Corollary 1.

Lemma 8.  (i) For each edge e in G such that e ∈ nontree(t(e)),
P(e) as computed by DECOMPOSE_AND_SEQUENCE is a non-redundant path
expression representing exactly the paths in G from h(e') to t(e)
which end with edge e and contain only proper descendants of h(e')
as intermediate vertices, where e' is the edge of the derived graph
corresponding to e.

(ii) For each vertex v in G, R(v) as computed by DECOMPOSE_AND_SEQUENCE
is a non-redundant path expression representing exactly the paths in G
from idom(v) to v which contain only proper descendants of idom(v)
as intermediate vertices.

(iii) For each vertex u in G, Y_u as computed by DECOMPOSE_AND_SEQUENCE
is a sequence Y_u = (P_1, v_1, w_1), (P_2, v_2, w_2), ..., (P_l, v_l, w_l)
satisfying (6a), (6b), and

(9)  For any non-empty path p in G which starts and ends at a child
of u and contains only proper descendants of u as intermediate vertices,
there is a unique sequence of indices 1 ≤ i_1 < i_2 < ... < i_k ≤ l and
a unique partition of p into non-empty paths p = p_1, p_2, ..., p_k such
that p_j ∈ σ(P_{i_j}) for 1 ≤ j ≤ k.

Proof.  Straightforward by induction on the number of times the loop
in DECOMPOSE_AND_SEQUENCE is executed.  □

Theorem 9.  Procedure DECOMPOSE_AND_SEQUENCE correctly computes a path
sequence for G.

Proof.  Analogous to the proof of Theorem 8.  □

DECOMPOSE_AND_SEQUENCE thus provides a way to compute path sequences
in arbitrary graphs. The running time of the method is O(m α(m, n) + t)
if stratified path compression is used to implement the forest operations
and O((m log n) + t) if path compression is used, where t is the time
to find path sequences for the dominator strong components of G. The
length of the path sequence produced is either O(m α(m, n)) + l or
O(m log n) + l, where l is the total length of the path sequences for
the dominator strong components.


7.  Remarks.


In this paper we have described fast algorithms for solving path
expression problems on reducible or almost-reducible graphs. The fastest
method requires O(m α(m, n) + t) time to compute a path sequence for an
arbitrary directed graph, where t is the amount of time required to
compute path sequences for the dominator strong components. A slower
but much simpler method requires O(m log n + t) time and promises to
be easy to program and efficient in practice.

By using our algorithms in combination with the mapping technique
described by Tarjan [30], we can solve many kinds of path problems,
including finding shortest paths, carrying out forward and backward
global flow analysis, and solving sparse systems of linear equations.

There are two rather different ways of doing this. The first is to
use the solution to a path expression problem as a general-purpose
straight-line program which solves any particular path problem by
properly interpreting ∪, ·, and *. The second is to use an algorithm
for solving a path expression problem to solve a particular path problem
by reinterpreting ∪, ·, and * within the algorithm; this avoids the
intermediate step of first constructing a directed acyclic graph
representing a set of path expressions. The choice between these two
methods depends upon the time and space available and whether we want
to solve one or many path problems on the same graph.
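
As an illustration of the first approach, the Python sketch below
evaluates a path expression under a shortest-path interpretation: ∪
becomes min, · becomes +, and * becomes 0, which is valid only under the
assumption that no cycle has negative cost. The tuple encoding of
expressions and the function name shortest are ours and are not taken
from [30].

```python
import math


def shortest(expr, cost):
    """Evaluate a path expression as a shortest-path length.

    expr is a nested tuple: ("empty",), ("lambda",), ("edge", name),
    ("union", a, b), ("concat", a, b) or ("star", a).
    cost maps edge names to non-negative lengths (an assumption).
    """
    op = expr[0]
    if op == "empty":
        return math.inf          # no path at all
    if op == "lambda":
        return 0                 # the empty path
    if op == "edge":
        return cost[expr[1]]
    if op == "union":
        return min(shortest(expr[1], cost), shortest(expr[2], cost))
    if op == "concat":
        return shortest(expr[1], cost) + shortest(expr[2], cost)
    if op == "star":
        if shortest(expr[1], cost) < 0:
            raise ValueError("negative cycle: star is undefined here")
        return 0                 # repeating a non-negative cycle never helps
    raise ValueError(f"unknown operator {op!r}")
```

The same expression can be evaluated again with a different cost table,
or with a different interpretation of the three operators, without
recomputing the expression itself.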

For path problems in which the operation corresponding to + is
idempotent, the non-redundancy and uniqueness conditions in (6) and
Theorem 1 are not necessary and can be dropped [30]. In such cases we
can use the sophisticated algorithm of Tarjan [29] to carry out the
forest manipulation operations and achieve an O(m α(m, n) + t) time
bound [27]. It does not seem possible to adapt this method to satisfy
non-redundancy, however. The only interesting path problem known to
the author which does not have the idempotent property is the solution
of sparse systems of linear equations. For this problem another form
of tree manipulation described by Tarjan [29] gives a rather simple
O(m α(m, n) + t)-time algorithm [28].

The method of decomposition by dominators is a kind of single-element
"tearing" [5] in which the clever use of data structures allows us to
make  the  combining  step  very  efficient.  The  result  may  be  generalizable 
in  various  directions.  For  instance,  on  problem  graphs  for  which  there 
is  no  natural  start  vertex  we  would  like  to  know  how  to  pick  a start 
vertex  which  gives  the  finest  decomposition.  It  may  also  be  possible 
to  extend  the  technique  to  regions  with  two  or  more  entry  vertices.  We 
leave  these  questions  to  the  ambitious  reader. 

Appendix: Graph Theoretic Terminology.


A directed graph G = (V, E) is a finite set V of vertices and a
finite set E of edges such that each edge e has a head h(e) ∈ V and
a tail t(e) ∈ V. We regard the edge e as leading from h(e) to t(e),
and we say the edge e leaves h(e) and enters t(e). We usually
denote the number of vertices by n and the number of edges by m.
A loop is an edge e with h(e) = t(e). A path p = e_1, e_2, ..., e_k is
a sequence of edges such that t(e_i) = h(e_{i+1}) for 1 ≤ i ≤ k-1. The
path is from h(p) = h(e_1) to t(p) = t(e_k). The path contains edges
e_1, e_2, ..., e_k and vertices h(e_1), h(e_2), ..., h(e_k), t(e_k) and
avoids all other edges and vertices. There is a path of no edges from
any vertex to itself. A cycle is a non-empty path from a vertex to
itself. A graph is acyclic if it contains no cycles.

The reverse G^R of a graph G is the graph formed by replacing
each edge e with an edge e^R such that h(e^R) = t(e) and t(e^R) = h(e).
If G_1 = (V_1, E_1) and G_2 = (V_2, E_2) are graphs, G_1 is a subgraph of
G_2 if V_1 ⊆ V_2 and E_1 ⊆ E_2. G_1 is the subgraph of G_2 induced by
V_1 if V_1 ⊆ V_2 and E_1 = {e ∈ E_2 | h(e), t(e) ∈ V_1}.

A vertex v is reachable from a vertex w in a graph G if there
is a path from w to v. G is strongly connected if every vertex is
reachable from every other vertex. The strong components of G are its
maximal strongly connected subgraphs. These components are uniquely
defined and partition the vertices of G.

A flow graph G = (V, E, r) is a graph with a distinguished start
vertex r such that every vertex is reachable from r. A (directed,
rooted) tree T = (V, E, r) is a flow graph with |E| = |V|-1. The start
vertex r is the root of the tree. Any tree is acyclic, and if v
is any vertex in T, there is a unique path from r to v. If v
and w are vertices in a tree T and there is a path from v to w,
v is an ancestor of w and w is a descendant of v. We denote
this relationship by v →* w. If in addition v ≠ w, v is a proper
ancestor of w and w is a proper descendant of v, denoted by v →+ w.
If there is an edge from v to w, v is the parent of w and w is
a child of v, denoted by v → w. Two vertices with a common parent
are siblings. In a tree each vertex has a unique parent (except the
root, which has no parent).